South African Language Resources: Phrase Chunking
نویسنده
چکیده
Phrase chunking remains an important natural language processing (NLP) technique for intermediate syntactic processing. This paper describes the development of protocols, annotated phrase chunking data sets and automatic phrase chunkers for ten South African languages. Various problems with adapting the existing annotation protocols of English are discussed as well as an overview of the annotated datasets. Based on the annotated sets, CRF-based phrase chunkers are created and tested with a combination of different features, including part of speech tags and character n-grams. The results of the phrase chunking evaluation show that disjunctively written languages can achieve notably better results for phrase chunking with a limited data set than conjunctive languages, but that the addition of character n-grams improve the results for conjunctive languages.
منابع مشابه
Part-of-Speech Tagging and Chunking in Text-to-Speech Synthesis for South African Languages
Text-to-speech synthesis can be an empowering communication tool in the hands of the print-disabled or augmentative and alternative communication user. In an effort to improve the naturalness of synthesised speech – and thus enhance the communication experience – we apply the natural language processing tasks of part-of-speech tagging and chunking to the text in the synthesis process. We cover ...
متن کاملA Hybrid Model for Phrase Chunking Employing Artificial Immunity System and Rule Based Methods
Natural language Understanding (NLU), an important field of Artificial Intelligence (AI) is concerned with the speech and language understanding between human and computer. Understanding language means knowing what concept a word or phrase stands for and how to link them to form meaningful sentence. Identification of phrases or phrase chunking is an important step in natural language understand...
متن کاملIndexing Anatomical Phrases in Neuro-Radiology Reports to the UMLS 2005AA
This work describes a methodology to index anatomical phrases to the 2005AA release of the Unified Medical Language System (UMLS). A phrase chunking tool based on Natural Language Processing (NLP) was developed to identify semantically coherent phrases within medical reports. Using this phrase chunker, a set of 2,551 unique anatomical phrases was extracted from brain radiology reports. These ph...
متن کاملImproved Arabic Base Phrase Chunking with a new enriched POS tag set
Base Phrase Chunking (BPC) or shallow syntactic parsing is proving to be a task of interest to many natural language processing applications. In this paper, A BPC system is introduced that improves over state of the art performance in BPC using a new part of speech tag (POS) set. The new POS tag set, ERTS, reflects some of the morphological features specific to Modern Standard Arabic. ERTS expl...
متن کاملRapid Development of Nlp Modules with Memory-based Learning
The need for software modules performing natural language processing (NLP) tasks is growing. These modules should perform efficiently and accurately, while at the same time rapid development is often mandatory. Recent work has indicated that machine learning techniques in general, and memory-based learning (MBL) in particular, offer the tools to meet both ends. We present examples of modules tr...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2016